Generic endpoint is on port 9200
/ # to get Elasticsearch version
/_cat/health?v
/_cat/nodes?v
/_cat/indices?v
/_cluster/health
/_cluster/state # gives elected master and shard initialisation status
/_cluster/stats
/_cluster/settings
Further introspection:
/_nodes/
/_nodes/process
/_nodes/settings
/_nodes/stats
/_aliases
/_warmers # only on old versions, warmers were removed in Elasticsearch 5.0
/_mappings
GET /_cat/indices?v
GET /<index name>?pretty
PUT /<index name>
DELETE /<index name>
GET /_settings # Print config for all indices
Copying indices using "reindex": Indices can be copied fully or partially (restricted by a query), either from a local index or from an index on a remote cluster:
POST /_reindex
{
"source": {
"remote": {
"host": "http://otherhost:9200",
"username": "user",
"password": "pass"
},
"index": "source",
"query": {
"match": {
"test": "data"
}
}
},
"dest": {
"index": "dest"
}
}
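The reindex body above can also be assembled programmatically, which makes it easier to switch between a local and a remote copy. The following is a minimal sketch, not an official client API: `build_reindex_body` is a hypothetical helper, and the host and credentials are the placeholders from the example above.

```python
import json


def build_reindex_body(source_index, dest_index, query=None, remote=None):
    """Build the JSON body for POST /_reindex.

    query  -- optional query restricting which documents are copied
    remote -- optional connection info of another cluster
    (hypothetical helper, not part of any Elasticsearch client)
    """
    source = {"index": source_index}
    if remote is not None:
        source["remote"] = remote
    if query is not None:
        source["query"] = query
    return {"source": source, "dest": {"index": dest_index}}


if __name__ == "__main__":
    body = build_reindex_body(
        "source",
        "dest",
        query={"match": {"test": "data"}},
        remote={"host": "http://otherhost:9200", "username": "user", "password": "pass"},
    )
    # Send the printed JSON as the body of POST /_reindex
    print(json.dumps(body, indent=2))
```

Leaving out `remote` gives a local copy, leaving out `query` copies the whole index.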
Endpoints for index aliases are quite messy
GET /_aliases?pretty
POST /_aliases
{
"actions" : [
{ "add" : { "index" : "<index>-000001", "alias" : "my-<index>-alias" } }
]
}
DELETE /{index}/_alias/{name}
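All actions of a single POST /_aliases request are applied atomically, so an alias can be moved from one index to another without a gap. A minimal sketch of building such a request body follows; `build_alias_swap` and the index names are hypothetical.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Body for POST /_aliases that moves `alias` from old_index to new_index.

    Hypothetical helper: both actions travel in one request, so readers of
    the alias never see a state where it points to no index.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Example index and alias names are placeholders
    print(json.dumps(build_alias_swap("my-logs-alias", "logs-000001", "logs-000002"), indent=2))
```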
Trigger index rollover
POST /<alias>/_rollover
{
"conditions": {
"max_age": "3d",
"max_docs": 1000000,
"max_size": "30g"
}
}
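The rollover conditions form a JSON object, and each condition is optional. A small sketch that builds the body from whichever limits are given; `build_rollover_body` is a hypothetical helper name.

```python
import json


def build_rollover_body(max_age=None, max_docs=None, max_size=None):
    """Body for POST /<alias>/_rollover (hypothetical helper).

    Only the conditions that are set are included. With no condition at
    all an empty body is returned, which requests an unconditional rollover.
    """
    conditions = {}
    if max_age is not None:
        conditions["max_age"] = max_age
    if max_docs is not None:
        conditions["max_docs"] = max_docs
    if max_size is not None:
        conditions["max_size"] = max_size
    return {"conditions": conditions} if conditions else {}


if __name__ == "__main__":
    print(json.dumps(build_rollover_body(max_age="3d", max_docs=1000000, max_size="30g"), indent=2))
```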
List unassigned shards
curl -s "<server>:9200/_cat/shards?v" | grep -E "UNASSIGNED|prirep"
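If the shard list is to be processed further instead of grepped, the tabular _cat output can be parsed using its header line. A minimal sketch, assuming the text was fetched with the `?v` flag so that a header is present; `unassigned_shards` is a hypothetical helper and the sample rows are made up.

```python
def unassigned_shards(cat_shards_output):
    """Return (index, shard, prirep) tuples for all UNASSIGNED shards.

    Expects the text returned by /_cat/shards?v. Unassigned rows have no
    docs/store/ip/node values, so only the leading columns are relied on.
    """
    lines = cat_shards_output.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    result = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        if row.get("state") == "UNASSIGNED":
            result.append((row["index"], int(row["shard"]), row["prirep"]))
    return result


if __name__ == "__main__":
    # Made-up sample output for illustration
    sample = (
        "index               shard prirep state      docs store ip       node\n"
        "logstash-2019.10.13 0     p      STARTED    1200 1.2mb 10.0.0.1 instance-0000000009\n"
        "logstash-2019.10.13 0     r      UNASSIGNED\n"
    )
    print(unassigned_shards(sample))
```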
Get info when shards are not allocated
GET /_cluster/allocation/explain
Retry allocation of shards (after retry limit reached)
POST /_cluster/reroute?retry_failed=true
curl -H 'Content-Type: application/json' -XPOST "$ELASTIC_URL/_cluster/reroute?retry_failed=true"
Manual recovery
curl -H 'Content-Type: application/json' -XPOST $ELASTIC_URL/_cluster/reroute -d '{
"commands": [ {
"allocate_replica": {
"index": "logstash-2019.10.13",
"shard": 0,
"node": "instance-0000000009"
}
} ]
}'
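When several replicas need to be placed manually, the reroute body can carry one command per shard. The following sketch builds such a body; `build_allocate_replica_commands` is a hypothetical helper, and the index and node names are the ones from the example above.

```python
import json


def build_allocate_replica_commands(shards, node):
    """Body for POST /_cluster/reroute with one allocate_replica per shard.

    shards -- iterable of (index_name, shard_number) tuples
    node   -- name of the node that should receive the replicas
    (hypothetical helper for illustration)
    """
    return {
        "commands": [
            {"allocate_replica": {"index": index, "shard": shard, "node": node}}
            for index, shard in shards
        ]
    }


if __name__ == "__main__":
    body = build_allocate_replica_commands([("logstash-2019.10.13", 0)], "instance-0000000009")
    print(json.dumps(body, indent=2))
```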
GET /<index name>/external/1?pretty
# Insert/Replace
PUT /<index name>/external/1
{ "key": "value" }
# Update
POST /<index name>/external/1/_update
{ "doc": { "count": 5 } }
POST /<index name>/external/1/_update
{ "script": "ctx._source.count += 1" }
DELETE /<index name>/external/1
# Delete by query (legacy endpoint; Elasticsearch 5.0+ uses POST /<index name>/_delete_by_query)
DELETE /<index name>/external/_query
{ "query": { "match": { "key": "value" } } }
Batch processing
POST /<index name>/external/_bulk
{"index":{"_id":"1"}}
{"key1": "value1"}
{"index":{"_id":"2"}}
{"key2": "value2"}
{"update":{"_id":"3"}}
{"doc": { "key3": "value3" } }
{"delete":{"_id":"4"}}
[...]
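The bulk body is newline-delimited JSON: one action line, followed by a document line for every action except delete, and a newline after the last line. A minimal sketch of generating it; `build_bulk_payload` and its tuple format are assumptions made for this example.

```python
import json


def build_bulk_payload(operations):
    """Serialize operations into the newline-delimited _bulk format.

    operations -- iterable of (action, metadata, body) tuples, e.g.
                  ("index", {"_id": "1"}, {"key1": "value1"}).
                  For "delete" the body is ignored and may be None.
    (hypothetical helper and tuple layout)
    """
    lines = []
    for action, metadata, body in operations:
        lines.append(json.dumps({action: metadata}))
        if action == "delete":
            continue  # delete has no document line
        if body is None:
            raise ValueError("action %r needs a body" % action)
        lines.append(json.dumps(body))
    # The bulk API requires the payload to end with a newline
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_payload([
        ("index", {"_id": "1"}, {"key1": "value1"}),
        ("update", {"_id": "3"}, {"doc": {"key3": "value3"}}),
        ("delete", {"_id": "4"}, None),
    ]), end="")
```

When sending this with curl, use `--data-binary` rather than `-d`, as `-d` strips the newlines.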
Just a simple search example to explain query building
GET /<index name>/external/_search?q=*
POST /<index name>/external/_search
{
"query": { "match": { "field1": "abcdef" } },
"sort": { "balance": { "order": "desc" } },
"from": 10,
"size": 10,
"_source": ["field1", "field2"]
}
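In the search body, "from" and "size" implement paging: "from" is the offset of the first hit, so page n (counting from 0) starts at n * size. A small sketch that builds the body above from a page number; `build_search_body` is a hypothetical helper.

```python
import json


def build_search_body(field, text, page=0, size=10, sort_field=None, source_fields=None):
    """Body for POST /<index name>/_search with a match query and paging.

    page is zero-based; the offset sent as "from" is page * size.
    (hypothetical helper for illustration)
    """
    body = {
        "query": {"match": {field: text}},
        "from": page * size,
        "size": size,
    }
    if sort_field is not None:
        body["sort"] = {sort_field: {"order": "desc"}}
    if source_fields is not None:
        body["_source"] = list(source_fields)
    return body


if __name__ == "__main__":
    # Second page of 10 hits, reproducing the request shown above
    print(json.dumps(build_search_body("field1", "abcdef", page=1, size=10,
                                       sort_field="balance",
                                       source_fields=["field1", "field2"]), indent=2))
```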
This is a short summary of how to do rollover deployments of any configuration change (e.g. changes in location awareness) that requires a restart of all data nodes in an Elasticsearch cluster. It also applies to version upgrades.
Obviously it is not a good idea to restart all nodes at once, so a sequential rollover is what we are looking for. As the cluster is in operation, we want to limit the reallocation and rebalancing caused by the restarts as much as possible.
Note this is a summary derived from the official docs.
For each node do:
- Disable shard allocation
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{ "persistent": { "cluster.routing.allocation.enable": "none" } }'
- Sync shards to disk. This reduces the probability that local shards have to be reinitialized after the restart.
curl -X POST "localhost:9200/_flush/synced"
- Stop all instances on the node. E.g.
systemctl stop elasticsearch.service
- Apply the change and start the instances again. E.g.
systemctl start elasticsearch.service
- Re-enable shard allocation
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{ "persistent": { "cluster.routing.allocation.enable": null } }'
- Wait for the node to recover
while true; do curl -s -XGET "localhost:9200/_cat/health" | grep green && break; sleep 5; done
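The per-node steps above can be sketched as a loop. This is only an outline under assumptions: the callables `set_allocation`, `flush`, `restart_node` and `get_status` are hypothetical and would have to wrap the curl and systemctl commands shown above; they are injected so the sequence can be tried out without a cluster.

```python
import time


def wait_for_green(get_status, interval=5.0, max_checks=120, sleep=time.sleep):
    """Poll get_status() until it returns "green"; give up after max_checks."""
    for _ in range(max_checks):
        if get_status() == "green":
            return True
        sleep(interval)
    return False


def rolling_restart(nodes, set_allocation, flush, restart_node, get_status,
                    sleep=time.sleep):
    """Restart nodes one at a time, waiting for a green cluster in between.

    set_allocation("none") disables shard allocation and
    set_allocation(None) resets it to the default (JSON null).
    """
    for node in nodes:
        set_allocation("none")   # disable shard allocation
        flush()                  # sync shards to disk
        restart_node(node)       # stop, apply change, start
        set_allocation(None)     # re-enable shard allocation
        if not wait_for_green(get_status, sleep=sleep):
            raise RuntimeError("cluster did not turn green after restarting %s" % node)


if __name__ == "__main__":
    # Dry run with stand-in callables that only print what would happen
    rolling_restart(
        ["node-1", "node-2"],
        set_allocation=lambda value: print("allocation ->", value),
        flush=lambda: print("flush"),
        restart_node=lambda node: print("restart", node),
        get_status=lambda: "green",
    )
```

Stopping with an error when a node does not recover keeps the loop from taking down a second node while the cluster is still degraded.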
And finally you might want to trigger a rebalance to optimize distribution. In most rollover cases where you change the shard allocation awareness, the restart of the last nodes automatically triggers the rebalancing as the distribution mismatch reaches the trigger threshold.
In case the rebalancing/reallocation becomes too slow, you might want to temporarily increase the concurrency settings for those.
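The text does not name the concurrency settings. The sketch below assumes it refers to `cluster.routing.allocation.cluster_concurrent_rebalance` and `cluster.routing.allocation.node_concurrent_recoveries`, and builds a body for PUT /_cluster/settings. `build_concurrency_settings` is a hypothetical helper and the value 4 is only an illustration, not a recommendation.

```python
import json

# Assumed to be the settings meant by "concurrency settings"
REBALANCE = "cluster.routing.allocation.cluster_concurrent_rebalance"
RECOVERIES = "cluster.routing.allocation.node_concurrent_recoveries"


def build_concurrency_settings(rebalance, recoveries, scope="transient"):
    """Body for PUT /_cluster/settings.

    Passing None for a value serializes to JSON null, which resets the
    setting to its default once the rollover is done.
    """
    if scope not in ("transient", "persistent"):
        raise ValueError("scope must be 'transient' or 'persistent'")
    return {scope: {REBALANCE: rebalance, RECOVERIES: recoveries}}


if __name__ == "__main__":
    print(json.dumps(build_concurrency_settings(4, 4)))        # raise temporarily
    print(json.dumps(build_concurrency_settings(None, None)))  # reset afterwards
```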
- Index retention: Curator
- Webadmin: Cerebro
- Auth: XPack Security (previously "Shield"), SearchGuard
- Alerting: Elastalert, Logagent, Sentinl
- Monitoring:
  - by Elastic: Marvel, XPack
- Viki 2015
  - Ingest: 25k/s Access Logs
  - haproxy as Logstash LB
  - Logstash single-threaded filters, 4 Nodes (8 CPU, 16GB)
  - Logstash Forwarder Client with buffer log
  - Elasticsearch:
    - 20 Nodes (12 i7-3930k, 64GB, 3TB RAID0)
    - 20 shards, 4 replicas
    - 30GB heap
- Meltwater 2018
  - Search Volume: 3k/min complex search requests
  - Index Size: 3*10^6 articles, 100*10^6 social posts, 200TB
  - Elasticsearch:
    - 430 data nodes: i3.2xlarge, 64GB RAM
    - 3 master nodes
    - 40k shards, 100MB cluster state!
    - 26GB heap
- Etsy 2016
  - Index Size: overall 1.5PB
  - Ingest: 10^9 loglines/day, 400k/s peak
  - Elasticsearch:
    - 6 clusters, 141 Nodes (overall 4200 CPU Cores, 36TB)
- codecentric.de Tuning Hints
- hipages Engineering - Scaling ES
  - Scaling on index size (metrics: documents per shard, documents per node)
  - Change shards to trade search response time for search concurrency
  - Change nodes to trade resilience for memory usage
  - Scaling on search time and throughput
- Evolution of an ELK Setup
  - ELK with 1 Logstash
  - ELK with loadbalanced horizontally scaled Logstash
  - ELK with Kafka in front of Logstash to buffer spikes
  - Separation of client, data and master Elasticsearch nodes
- Determining the Number of Shards
- Tuning Logstash
Note: credit for all of these goes to the post above. Consider this a compilation for ES beginners.
- Set CPU scaling governor 'performance'
- Use SSDs with RAID 0
- Use HTTP transport protocol
- Change default mapping
  - Avoid raw fields
  - or make raw field "not_analyzed"
- Disable transparent huge pages
- Disable numad
- Disable swap, lock memory with bootstrap.mlockall: true
- Do not optimize JVM settings for max memory usage!
  - Try to live with 4GB heap
  - Do not give the JVM more than 30GB heap (sometimes only as much as 26GB), as the JVM stops using compressed object pointers (compressed oops) with larger heaps
    - Check heap address mode by running with -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode and if you see "zero based Compressed Oops" you are fine
  - Check your heap usage curve. If it shows a sawtooth, give the memory back to the FS cache.
- When profiling
  - check for >15% ParNewGC
  - check SerialGC pauses
  - ensure you do not have the G1 garbage collector active
- Logstash:
  - On HW consider turning off hyperthreading
  - Increase flush_size
  - Increase idle_flush_time
  - Increase output workers
  - Finally increase pipeline batch size
- Avoid split-brain by setting discovery.zen.minimum_master_nodes
- Monitor the fielddata cache to avoid running into OOM errors that kill your cluster
- Logstash Pipeline Monitoring using XPack + Kibana
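For the split-brain hint above: discovery.zen.minimum_master_nodes is usually set to a majority (quorum) of the master-eligible nodes, i.e. N / 2 + 1 with integer division. The setting belongs to the Zen discovery used before Elasticsearch 7. A small sketch of the arithmetic; the function name is hypothetical.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: floor(N / 2) + 1.

    With this value two disjoint halves of a partitioned cluster can never
    both elect a master, because only one side can hold a majority.
    """
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, "master-eligible nodes ->", minimum_master_nodes(n))
```

Note that with two master-eligible nodes the quorum is 2, so losing either node blocks master election; three master-eligible nodes are the smallest setup that tolerates one failure.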